---
output:
  pdf_document:
    fig_caption: yes
    citation_package: natbib
    toc_depth: 2
title: "Online Appendix: Personalizing Moral Reframing in Interpersonal Conversation"
author: 
- Joshua L. Kalla^[Yale University, josh.kalla@yale.edu]
- Adam Seth Levine^[Johns Hopkins University]
- David E. Broockman^[University of California, Berkeley]

bibliography: SMabortion.bib
biblio-style: plain
header-includes:
  - \usepackage{pdfpages}
  - \usepackage{caption} 
  - \usepackage{float}
  - \usepackage{comment}
---

\setcounter{table}{0}
\renewcommand\thetable{OA\arabic{table}}

\tableofcontents

\newpage

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

```{r load, echo=FALSE, message=FALSE, warning=FALSE, include = FALSE}
#Load Libraries
library(knitr)
library(pander)
library(bookdown)
library(captioner)
library(sandwich)
library(lmtest)
library(kableExtra)
library(ggplot2)
library(tidyverse)
library(plyr)
library(ebal)

#setwd("")
data <- read.csv("abortion_canvass_data.csv", stringsAsFactors = FALSE)
```

The full replication code and data that produce this report will be available at the JOP Data Archive on Dataverse.

This experiment was pre-registered at https://osf.io/rft8w/.

# Intervention Details

### Training

Before beginning the experiment we worked with the partner organization for 8 months (spread over two summers) to develop and pilot test the intervention and develop the training. The canvassers were all volunteers. They received training when they first started, including being paired with a more experienced canvasser for a period of time. Throughout the program they received ongoing training and feedback, as well as opportunities for reflection. The training focused on providing canvassers with the skills to listen to and ask questions of voters in a non-judgmental context. They learned about and practiced relationship-building techniques such as using openers (i.e. asking questions rather than lecturing), responsiveness (e.g. responding directly to what the voter said with relevant follow-up questions), and being affirming (i.e. respecting, and not challenging, what voters share). The training also focused on making them comfortable with having conversations about a sensitive topic with strangers. These trainings were all generic trainings on how to have a high-quality canvass conversation.

On abortion in particular, the canvassers received trainings in two specific skills. First, given our focus on personalized moral reframing, the training paid particular attention to Moral Foundations Theory and how canvassers could identify the moral values underlying the beliefs and experiences that voters share along with how to present new arguments that are tied to those values. For example, canvassers completed a values worksheet where they would read a particular statement a voter might say related to a certain moral frame and then discuss ways to respond to this. This worksheet focused on the values of care, fairness, loyalty, authority, sanctity, and liberty.

Second, canvassers also received extensive training related to abortion facts. For example, canvassers received facts about how many women who choose to have an abortion are already parents. 

Overall, trainings involved a combination of role play and viewing video of past canvass conversations.

### Canvasser Demographics

The canvassers for this project were volunteer canvassers recruited by the local partner organization. Below we summarize the demographics of the canvassers:

- The average canvasser was `r round(mean(data[data$canvassed == 1,]$canvasser_age, na.rm = TRUE))` years old. 
- `r round(mean(data[data$canvassed == 1,]$canvasser_female, na.rm = TRUE), 2)*100`% were female. 
- `r round(mean(data[data$canvassed == 1,]$canvasser_white, na.rm = TRUE), 2)*100`% were white. 
- `r round(mean(data[data$canvassed == 1,]$canvasser_children, na.rm = TRUE), 2)*100`% had children.
- `r round(mean(data[data$canvassed == 1,]$canvasser_abortion, na.rm = TRUE), 2)*100`% reported having had an abortion.

### Intervention Procedure

The canvassers were trained to follow the procedure below when approaching homes where subjects were in the treatment condition. Because we were mainly concerned with external validity, this procedure does not rely strictly on a single theoretical paradigm, as is common in lab studies. However, the majority of the time in the training and in the conversations was spent on how to connect on values and non-judgmentally exchange narratives. The full scripts are reproduced below.

Canvassers themselves were not aware of the details of the experiment or the survey and nowhere in the conversation did they indicate that the effects of the conversation were being measured or part of the study.

The intervention had the following six steps.

#### 1. Establish contact

Determine if the voter is home. The canvasser knocks on the door and says, “Hi, are you (voter’s name)?” If the subject identifies him/herself, then the canvasser marks “voter came to door” on their walk list. This leads the voter to be targeted for resurveying. Note that this first step is identical in the placebo and treatment conditions.

#### 2. Collect initial ratings and follow-up questions to surface experiences, beliefs, and values

The canvassers began the intervention by engaging in a series of strategies to elicit participants’ opinions in a non-judgmental manner. First, they informed voters that they were talking with folks in their neighborhood about abortion, and acknowledged that it is not a topic that people frequently talk about. Canvassers then asked voters to report their opinion on how permissible abortion should be on a scale from 0-10 and then also asked them to reflect upon and explain their response. Canvassers were trained to probe responses to learn more about the reasons underlying these opinions, all while not indicating that they were pleased or displeased with any particular answer. Instead, the goal was for them to appear genuinely interested in hearing the voter reflect on the questions.

#### 3. Surface voters’ beliefs and attitudes

Afterwards, the canvasser asked a number of follow-up questions to further understand the experiences that shaped voters’ views on abortion, and in particular their previous experiences related to abortion and pregnancy. Canvassers would respond directly and ask for more information, and also ask voters to reflect upon how they/others felt during these experiences. Canvassers also listened carefully for the underlying moral values that the voter was expressing. For instance, voters might cite moral values such as loyalty (e.g., standing by one's friends even if you don't see eye to eye), fairness (e.g., the desire to treat others fairly), and care (e.g., caring about one's family and wanting to do right by them). Overall, the goal was to encourage effortful reflection and to build rapport, and also for the canvasser to learn about the underlying values tied to the voter’s beliefs and attitudes.

#### 4. Canvasser identifies values he/she and voter share

Next, the canvassers shared their own story, including the personal experiences that informed their beliefs about abortion as well as their underlying values. While they did that, they also explicitly acknowledged and named the values they shared with the voter (e.g., "I know you also believe it's important to stand by your friends, even if you don't see eye to eye"). To facilitate these exchanges, canvassers received pre-canvass trainings on Moral Foundations Theory and moral reframing and prepared multiple stories they could tell, depending on which value matched their own experiences and appeared most salient to the voter. Identifying shared values was important for the canvasser to further establish trust and credibility, as someone who could speak on behalf of those values.

#### 5. Canvasser engages in moral reframing to make the case

Next, canvassers would shift the conversation from personal beliefs and values toward the topic of what the experience should be like for a woman who has decided to end a pregnancy. They would first ask voters to reflect on that question. Afterwards, the canvasser would share his/her main argument: that safe, legal abortion should be available for all women and that women should be supported in their decisions and not be judged at all. While doing so, the canvasser would tie this argument to the values shared in common with the voter. This was personalized moral reframing at work. For example, if the conversation had surfaced a shared moral value of fairness, then the canvasser might say "It is only fair that women are supported in their decisions..." Afterwards the canvasser would ask the voter to reflect on what was said, once more taking the opportunity to affirm shared values.

#### 6. Wrap-up

Finally, the canvasser would circle back to the question they began with: how permissible abortion should be on a scale from 0-10. The canvasser would ask them to reflect upon and explain any changes. Then they would also ask voters who identified themselves as supporters of abortion access to contact their elected officials to state that support.

### Placebo Procedure

The sole purpose of the placebo conversations was to identify voters who were home and thus voters with whom the intervention could plausibly be attempted (see the section entitled "Placebo Design for Delivering Intervention"). When approaching homes where subjects were in the placebo group, canvassers instead followed this procedure:

1. **Determine if voter is home.** The canvasser knocks on the door and says, "Hi, are you [name]?" If the subject identifies themselves, the canvasser marks "Voter came to door" on their walk list. This leads the voter to be targeted for resurveying and coded as "canvassed" in our data. Note that this first step is identical in the placebo and treatment conditions.
2. **Placebo begins.** The canvasser would then deliver the placebo survey.
3. **Conversation ends.** The canvasser thanks the subject and leaves.

### Scripts

Below are the scripts for the Treatment and Placebo conditions.

\clearpage

\includepdf[pages={-}]{Treatment.pdf}
\includepdf[pages={-}]{Placebo.pdf}


# Survey Recruitment Procedures and Experimental Design

In this section we describe the survey recruitment procedures and the experimental design. We assess the representativeness of the sample at each step and test design assumptions in other sections.

### Baseline Survey

To measure the effects of the intervention, we conducted ostensibly unrelated surveys of voters living in Maine. To recruit voters to these surveys, the partner organization first provided us with contact information for voters living in the areas they planned to canvass, acquired from the publicly available list of registered voters. When selecting voters whose contact information to provide us with, the organization selected broadly, removing only voters with an out-of-state mailing address, voters with a seasonal mailing address, and a small number of voters whom it expected to be staunch opponents or supporters of abortion based on proprietary analytical models or prior contact with the organization. Voters living in Portland or South Portland were also removed. The study was limited to voters living in Androscoggin, Cumberland, and York Counties in Maine. We invited these voters to the baseline survey by mail. The survey was called the "Maine Opinion Study". (See the "Additional Survey Details" section for more detail.)

The recruitment letter included the survey web URL, a unique login for each voter, and instructions for taking the survey online. To participate, respondents entered the URL from the letter in their computer or smartphone and then their unique login. We mailed letters to `r length(unique(data$hh_id))` households that contained individual logins for `r nrow(data)` people; when multiple eligible voters lived in the same household, we sent the household one letter that contained a unique login for each person.

Voters were offered no incentives for completing the baseline survey but were offered \$5 for completing each follow-up survey. Voters received these incentives via email (collected during the survey) immediately upon completion of the survey. Voters could redeem these \$5 incentives as gift-cards to Amazon, iTunes, Starbucks, WalMart, or Home Depot or as donations to Habitat for Humanity, the National Parks Foundation, or Clean Water Fund.

We also conducted a second baseline survey to further improve precision. If subjects did not respond to this second baseline survey, mean values were imputed.
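As a minimal sketch of what mean imputation for non-respondents could look like (shown for illustration only and not evaluated as part of this report), the R code below fills missing second-baseline values with the mean among those who responded. The helper `impute_mean` and the toy data are hypothetical, and the sketch assumes the imputed value is the overall sample mean of each covariate.

```r
# Hypothetical helper: replace missing values in a numeric vector with the
# mean of the observed (non-missing) values in that vector.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

# Toy data, invented for illustration (not the study data).
toy <- data.frame(
  t1_respondent = c(1, 1, 0, 1, 0),    # answered the second baseline survey?
  t1_pid7       = c(2, 6, NA, 4, NA)   # missing for non-respondents
)

# Non-respondents receive the respondent mean; respondents are unchanged.
toy$t1_pid7 <- impute_mean(toy$t1_pid7)
toy
```

Because the covariate list used in estimation also includes the `t1_respondent` indicator, the regression can distinguish imputed from observed values.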

### Random Assignment of Households

`r sum(data$t0_respondent == 1, na.rm = TRUE)` voters completed at least one baseline survey and provided a valid email address. 7 voters who completed this survey were removed from the experiment at this point because they had moved or had requested to be removed from the Maine Opinion Study. This left `r sum(!is.na(data$treat))` voters to be randomized. We randomly assigned half to treatment and half to placebo. Voters were randomly assigned at the household level, ensuring that multiple voters who completed the pre-survey within the same household were always assigned to the same treatment condition. All analyses adjust standard errors to account for this clustered assignment (see details below). This procedure is identical to that used in \cite{broockman2016durably} and follows the best practices for field experiments with survey outcomes \cite{broockman2017design}.

The household-level clustered random assignment took place within blocks of two households. These blocks were formed by matching households on household size, on whether the household responded to the second baseline survey, and on the household-level average of the abortion-views factor. Within each block, one household (cluster) was assigned to each condition. This pre-treatment blocking reduces the chance of imbalance between conditions and improves precision \cite{broockman2017design}.
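The blocked, clustered assignment can be illustrated with a short R sketch (for illustration only, not evaluated as part of this report). It is not the code used to randomize the experiment: the sort-and-pair matching rule, the column names `hh_size`, `t1_any`, and `hh_factor`, and the toy data are all assumptions made for the example.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Toy household-level data, invented for illustration: one row per household.
hh <- data.frame(
  hh_id     = 1:8,
  hh_size   = c(1, 1, 1, 1, 2, 2, 2, 2),
  t1_any    = c(0, 0, 1, 1, 0, 0, 1, 1),   # responded to second baseline?
  hh_factor = c(-1.2, -0.9, 0.1, 0.4, -0.3, 0.0, 0.8, 1.5)
)

# Assumed matching rule: sort on the blocking variables so that similar
# households are adjacent, then pair off neighbours into blocks of two.
hh <- hh[order(hh$hh_size, hh$t1_any, hh$hh_factor), ]
hh$block <- rep(seq_len(nrow(hh) / 2), each = 2)

# Within each block, randomly assign one household to treatment (1) and
# the other to placebo (0). Every voter in a household shares this value.
hh$treat <- ave(rep(0, nrow(hh)), hh$block,
                FUN = function(z) sample(c(0, 1)))
hh
```

Under this scheme every block contributes exactly one treatment household and one placebo household, so the two conditions are balanced on the blocking variables by construction.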

### Random Assignment of Turfs

On the day of each canvass, groups of households were formed into "turfs" by the staff at the partner organization. "Turfs" are groups of nearby households convenient for two canvassers to visit by walking a short distance. Households were put in groups blind to treatment assignment and simply based on the geographic layout of households to be canvassed that day. A route connecting the households in the turf was then drawn, again blind to treatment assignment, such that an efficient route could be followed; half of the households were marked for a Canvasser A and half for a Canvasser B in an order each canvasser could follow. The groups of households (turfs) were then randomly assigned to pairs of canvassers by having canvassers pick a number corresponding to a turf out of a hat. Then, canvass leaders flipped a coin to determine which canvasser would knock on A doors and which on B doors. In some cases, Canvasser A and B would be one person. Data-quality checks conducted after the canvass ensured that canvassers all properly canvassed the assigned doors within their turf. This random assignment of canvassers to turfs allows us to assess canvasser-level treatment effect heterogeneity, such as by canvasser gender.

### Placebo Design for Delivering Intervention

Canvassers attempted to have a conversation about an unrelated issue (Medicaid expansion) with voters in the placebo group. This placebo-controlled experimental design \cite{nickerson2005scalable} is common in studies of door-to-door canvassing interventions and field experiments more generally \cite{broockman2017design}. Nickerson \cite{nickerson2005scalable} summarizes the placebo design:

> Rather than rely upon a control group that receives no attempted treatment, the group receiving the placebo can serve as the baseline for comparison for the treatment group...assuming that (1) the two treatments have identical compliance profiles; (2) the placebo does not affect the dependent variable; and (3) the same type of person drops out of the experiment for the two groups.

Gerber et al. \cite{gerber2010baseline} similarly summarize the design:

> subjects who agree to participate in a study and for whom the prospect of treatment is imminent are randomly assigned to receive either the treatment or the placebo.

The sole purpose of the placebo discussion was to identify subjects who were home and thus with whom a conversation at the door could be attempted (versus subjects who were not home at all or would not even open the door). Identifying this group allows a direct comparison of subjects with whom the intervention actually began to subjects with whom the intervention could have begun but did not because of their random assignment (and thus with whom a conversation about Medicaid expansion began instead). This design dramatically improves the precision of door-to-door canvassing experiments \cite{broockman2017design}.

We implemented the placebo design as follows.

First, the canvassers began by implementing an identical procedure regardless of experimental condition. Canvassers were given walk lists of voters to contact that had been sequentially ordered by voters' addresses blind to voters' treatment assignment. Canvassers proceeded down the list of houses in the experiment in this order, knocking on one door after another without regard to the household's experimental group. The beginning of the conversation was also identical in each condition: "Hi, are you [name]?" If the subject identified him/herself or came to the door at this point, the canvasser then checked a box called "Voter came to door" on their walk list. The experimental sample consists of those who came to the door at this point.

Only after canvassers determined whether the voter they were looking for came to the door or not did they begin either implementing the intervention or delivering the placebo. Importantly, nothing was different in the procedure before this point: voters did not know the canvasser intended to have a conversation with them about abortion or Medicaid expansion before identifying themselves or not; canvassers did not inform voters about the topic of the conversation before this point.

These procedures guarantee an unbiased experimental comparison among voters who came to the door and then were delivered the intervention or were then not delivered the intervention based on their random assignment \cite{nickerson2005scalable,broockman2017design}.

One strength of this study's research design is that we are able to sensitively test the placebo design's key assumption: the kinds of voters who identify themselves at their doors before the placebo starts and before the intervention starts are similar. Our tests support this assumption. We describe these tests in the *Tests of Design Assumptions* section.

### Follow-Up Surveys

Following the placebo design described above, we conducted multiple waves of follow-up surveys for voters who came to the door in any condition. These follow-up surveys began around 1 week, 1 month, and 3 months after the day each voter was canvassed. We solicited voters to complete these re-surveys at the email addresses they provided in the baseline survey. Three reminders to complete the follow-up surveys were sent for each survey wave.

Note that to the extent any voters answered the wrong surveys or did not answer the surveys carefully, this measurement error would lead us to underestimate the true effects of treatment \citep{gerber2012field}.

### Additional Survey Details

The survey was called the "Maine Opinion Study, a study being conducted at University #1 and University #2 of registered voters from across Maine." The survey was conducted by the authors using a panel initially recruited through the mail and then managed using Qualtrics via e-mail, using the e-mail addresses subjects provided us.

The population refers to registered voters in selected neighborhoods in Maine, as chosen by staff at the partner organization. Voters were recruited from this population by mail we sent to their household.

The table below shows how those who responded to the survey at each stage differ from those who were mailed an invitation to participate. These data come from the voter file. Note that no weighting is used in the analysis; the aim of the estimation is to test for the existence of treatment effects within this sample, not to generalize to the population of invited respondents.


```{r eval=TRUE, echo=FALSE}
assess.representativeness <- function(df) c(mean(df$vf_female), mean(df$vf_age),
                                            mean(df$vf_white), nrow(df))

starting.universe <- t(c("Starting", assess.representativeness(data)))
t0.sample <- t(c("Baseline Resp.", assess.representativeness(subset(data, data$t0_respondent == 1))))
t1.sample <- t(c("2nd Baseline Resp.", assess.representativeness(subset(data, data$t1_respondent == 1))))
canvassed.sample <- t(c("Canvassed", assess.representativeness(subset(data, data$canvassed == 1))))
t2.sample <- t(c("1 Wk Resp.", assess.representativeness(subset(data, data$t2_respondent == 1))))
t3.sample <- t(c("1 Mo Resp.", assess.representativeness(subset(data, data$t3_respondent == 1))))
t4.sample <- t(c("3 Mo Resp.", assess.representativeness(subset(data, data$t4_respondent == 1))))

representativeness <- rbind(starting.universe, t0.sample, t1.sample, canvassed.sample, 
                            t2.sample, t3.sample, t4.sample)
colnames(representativeness) <- c("Sample", "Female", "Age", "White", "N")
representativeness[, 2:4] <- round(as.numeric(representativeness[, 2:4]), 2)

kable(representativeness, caption = "Representativeness of Experiment at Each Stage")  %>%
  kable_styling(latex_options = c("striped", "hold_position"))
```

# Outcomes

The survey included dozens of political, social, and cultural questions, only some of which were related to abortion. In our pre-analysis plan, we indicated which items constituted experimental outcomes. Below we list these items and give their full text.

The below items appeared on multiple surveys; the # sign below will be replaced with the survey number in our analysis:

* the baseline survey is survey 0;
* the second baseline survey is survey 1;
* the 1-week survey is survey 2;
* the 1-month survey is survey 3; and
* the 3-month survey is survey 4.

The variable name for each item is written using `in-line code`. For the remainder of the paper, we will refer to these items by their variable names.
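To make the naming convention concrete, the following R sketch (for illustration only, not evaluated as part of this report) expands the `#` placeholder into wave-specific variable names. The helper `wave_var` is hypothetical and is not part of the replication code.

```r
# Hypothetical helper: turn an item name written with the "t#" placeholder
# into the variable name(s) for one or more survey waves.
wave_var <- function(item, wave) {
  paste0("t", wave, sub("^t#", "", item))
}

wave_var("t#_allow_6weeks", 0)  # baseline survey
wave_var("t#_therm_pp", 2:4)    # the three post-treatment surveys
```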

#### Abortion Policy/Legal Scale

The first set of policy/legal questions asked "Do you think abortion should be banned or allowed in the following circumstances?" Respondents were given the response options "Ban abortion" or "Allow abortion".

* `t#_allow_6weeks`: If a woman has already been pregnant for six weeks.
* `t#_allow_trimester`: If a woman has already been pregnant for more than one trimester (more than 12 weeks of pregnancy).
* `t#_allow_no_birthcontrol`: If a woman was not using birth control.
* `t#_allow_partner`: If a woman's partner disagrees with her decision.
* `t#_allow_already`: If a woman has already had an abortion before.
* `t#_allow_anyreason`: If a woman wants an abortion for any reason.

The second set of policy/legal questions asked "How much do you agree or disagree with the following statements regarding abortion?" Respondents were given a 7-point Agree-Disagree scale with a neutral midpoint of "Neither agree nor disagree". 

* `t#_law_appointment`: Maine law should require a woman to attend multiple appointments over several days in order to get an abortion.
* `t#_law_private_insur`: Maine law should prohibit private health insurance from covering abortions.
* `t#_law_counsel`: Maine law should require women to receive counseling that advises childbirth over abortion.
* `t#_law_public_insur`: Maine law should require that publicly-funded health insurance for poor people covers the cost of abortions.

A final policy/legal question asked "Maine is considering a number of ballot measures during the next few years. Below are some of the ballot measures under consideration. Would you favor or oppose each ballot measure?" Respondents were given a 5-point Favor-Oppose scale with a neutral midpoint of "Neither Favor nor Oppose".

* `t#_ballot_abortion`: Prohibiting doctors from performing abortions on women who are beyond 20 weeks of pregnancy.

#### Abortion Social/Stigma/Moral Scale

The first set of social/stigma/moral questions asked "How much do you agree or disagree with the following statements regarding abortion?" Respondents were given a 7-point Agree-Disagree scale with a neutral midpoint of "Neither agree nor disagree". 

* `t#_abortion_something_wrong`: Women who have had abortions have done something wrong.
* `t#_abortion_badly`: Women who have had abortions should feel badly about themselves.
* `t#_abortion_consider`: If a woman does not want to be pregnant, she should consider an abortion.
* `t#_abortion_birthcontrol`: Given the availability of modern birth control, women who have abortions are just irresponsible.
* `t#_abortion_nothing_wrong`: There's nothing wrong with having an abortion.

In addition, respondents were asked to rate "Women who have had abortions" on a 0-100 feeling thermometer. This is coded as `t#_therm_women_abort`.

#### Abortion Action-Taking Scale

The abortion action-taking questions asked "These days people are busy and often don’t have time to do many of the things they would like to. Suppose in the next month someone asked you to engage in the following activities. How likely would you be to say yes?" Respondents were given a 5-point response scale including "Extremely likely", "Very likely", "Somewhat likely", "A little likely", and "Not likely at all".

* `t#_act_volunteer_clinic`: Volunteer for an organization that supports women who need abortions.
* `t#_act_accompany`: Accompany a friend or family member to an abortion appointment.
* `t#_act_protest_clinic`: Protest outside an abortion clinic to show my opposition to abortion.
* `t#_act_congress_support`: Call a politician to express my support for legal and available abortion.
* `t#_act_congress_opposition`: Call a politician to express my opposition to legal and available abortion.

#### Additional Measure: Planned Parenthood Feeling Thermometer

We also asked respondents for a feeling thermometer rating of Planned Parenthood (coded as `t#_therm_pp`). This was not included in any of the above scales because it did not fit any of the above categories. We asked about Planned Parenthood because it is a commonly known group in American politics that supports access to abortion.

### Outcome Indices

In our pre-analysis plan, we specified that we would combine multiple items into indices to test hypotheses. Combining outcomes into an index increases precision by decreasing survey measurement error and limits the potential for multiple hypothesis testing \cite{broockman2017design}.

The indices, to be described momentarily, are as follows:

* `t#_factor_policy`: An index of outcomes from Abortion Policy/Legal Scale.
* `t#_factor_stigma`: An index of outcomes from Abortion Social/Stigma/Moral Scale.
* `t#_factor_action`: An index of outcomes from Abortion Action-Taking Scale.

In addition to an outcome index at each time period, we also present the results of a pooled outcome index --- `tALL_factor_` --- taking the average of each outcome index across time periods. If a respondent did not take a particular post-treatment survey, that time period is excluded from the average. Note that this pooled outcome was not pre-specified in our pre-analysis plan. We use this pooled outcome index to increase the precision of our treatment effect estimates through further reductions in measurement error. This pooled outcome is also a useful brief summary of the overall effects. We calculate the pooled outcome using the below Stata code:

~~~
// Generate factor averages for pooling
foreach factor in policy stigma action {
	egen tALL_factor_`factor' = rowmean(t2_factor_`factor' t3_factor_`factor' t4_factor_`factor')
}
~~~
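For readers working in R rather than Stata, an equivalent row mean can be sketched as follows (for illustration only, not evaluated as part of this report). The toy data are invented. As with Stata's `rowmean`, waves a respondent did not complete are skipped, and a respondent with no post-treatment waves is left missing.

```r
# Toy data, invented for illustration: post-treatment policy indices.
toy <- data.frame(
  t2_factor_policy = c(0.5, NA, NA),
  t3_factor_policy = c(0.1, 0.4, NA),
  t4_factor_policy = c(0.3, NA, NA)
)

waves <- c("t2", "t3", "t4")
for (f in c("policy")) {  # extend to c("policy", "stigma", "action")
  cols <- paste0(waves, "_factor_", f)
  # Average over the waves each respondent completed.
  pooled <- rowMeans(toy[, cols], na.rm = TRUE)
  # rowMeans() returns NaN when every wave is missing; recode to NA.
  pooled[is.nan(pooled)] <- NA
  toy[[paste0("tALL_factor_", f)]] <- pooled
}
toy
```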

### Procedure for Combining Outcomes into Indices

We pre-specified that we would create the indices by using factor analysis and rescaling the factors to have mean 0 and standard deviation 1.

We use the below Stata code to generate the factors. Note that we code all indices such that higher values on the indices indicate more tolerance and success of the intervention. If a factor is reverse-coded, we multiply by -1 to adjust for this.

~~~
factor [VARIABLES USED], fa(1)
predict t#_[FACTOR NAME]_temp
egen t#_[FACTOR NAME] = std(t#_[FACTOR NAME]_temp) // standardize to mean 0, SD 1
~~~

Note that all indices are coded so that positive values reflect greater support for abortion.
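A rough R analogue of this procedure is sketched below (for illustration only, not evaluated as part of this report). It is not a translation of the Stata code: base R's `factanal()` fits the factor model by maximum likelihood, whereas Stata's `factor` command uses the principal-factor method by default, so scores from the two can differ. The simulated items and their names are assumptions made for the example.

```r
set.seed(2)  # arbitrary seed so the sketch is reproducible

# Simulated items driven by one latent attitude (invented data).
n <- 200
latent <- rnorm(n)
items <- data.frame(
  item_a = latent + rnorm(n),
  item_b = latent + rnorm(n),
  item_c = -latent + rnorm(n),  # worded in the opposite direction
  item_d = latent + rnorm(n)
)

# One-factor model with regression-method scores (maximum likelihood fit).
fit <- factanal(items, factors = 1, scores = "regression")

# Standardize the scores to mean 0 and standard deviation 1.
index <- as.numeric(scale(fit$scores[, 1]))

# The sign of a factor is arbitrary; flip it if an item known to be
# positively worded loads negatively, so higher values mean more support.
if (unclass(fit$loadings)["item_a", 1] < 0) index <- -index
```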

# Estimation Procedures

### Average Treatment Effects

Consistent with our pre-analysis plan, we estimate treatment effects using ordinary least squares (OLS) regressions with cluster-robust standard errors, clustering on household and including the pre-treatment covariates from the baseline survey and voter list named in our pre-analysis plan. This procedure and these covariates were pre-specified and produce unbiased estimates of causal effects \cite{gerber2012field,broockman2017design}. Note that there is no reclassification of treatment based on what occurs at the door and we do not exclude any subjects who came to the door; we compare all subjects who came to the door and were pre-assigned to the treatment conversation to all subjects who came to the door and were pre-assigned to the placebo conversation.
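One way to write this specification is given below. The notation is introduced here only for illustration and does not appear in the pre-analysis plan.

$$
Y_{ih} = \alpha + \tau T_{h} + X_{ih}^{\prime}\beta + \varepsilon_{ih}
$$

Here $Y_{ih}$ is an outcome for respondent $i$ in household $h$, $T_{h}$ equals 1 if household $h$ was assigned to treatment and 0 if it was assigned to placebo, $X_{ih}$ is the vector of pre-treatment covariates, and $\tau$ is the average treatment effect among voters who came to the door. Standard errors are clustered on $h$ because assignment took place at the household level.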

```{r, include = FALSE}
t0.covariate.names <- c('t0_ideology', 't0_pid7',
                        't0_allow_6weeks', 't0_allow_trimester',
                        't0_allow_no_birthcontrol', 't0_allow_partner',
                        't0_allow_already', 't0_allow_anyreason',
                        't0_law_appointment', 't0_law_private_insur',
                        't0_law_counsel', 't0_law_public_insur',
                        't0_abortion_something_wrong', 't0_abortion_badly',
                        't0_abortion_consider', 't0_abortion_birthcontrol',
                        't0_abortion_nothing_wrong', 't0_act_volunteer_clinic',
                        't0_act_accompany', 't0_act_protest_clinic',
                        't0_act_congress_support', 't0_act_congress_opposition',
                        't0_ballot_abortion', 't0_therm_pp',
                        't0_therm_women_abort', 't1_law_appointment', 't1_law_public_insur',
                        't1_abortion_badly', 't1_abortion_consider', 't1_abortion_birthcontrol',
                        't1_ballot_abortion', 't1_abortion_always_legal',
                        't1_college_educ', 't1_relig_very_important',
                        't1_pid7', 't1_ideology',
                        't0_therm_lgbt', 't0_therm_pharma',
                        't0_therm_nra', 't0_therm_afam',
                        't0_therm_gun_owners', 'vf_female',
                        'vf_age', 't1_respondent')

x <- data[,c(t0.covariate.names)]
x <- as.matrix(x, dimnames = list(NULL, names(x)))

# Function to compute clustered standard errors, from Mahmood Arai.
cl <- function(fm, cluster){
  M <- length(unique(cluster))
  N <- length(cluster)
  K <- fm$rank
  dfc <- (M/(M-1))*((N-1)/(N-K))
  uj  <- apply(estfun(fm), 2, function(x) tapply(x, cluster, sum))
  vcovCL <- dfc*sandwich(fm, meat=crossprod(uj)/N)
  coeftest(fm, vcovCL)
}

# Function to extract the ATE from OLS with clustered SEs.
est.ate <- function(dv, include.obs = NULL, include.covariates = TRUE){
  if(is.null(include.obs)){
    include.obs <- !is.na(dv) 
  }
  include.obs <- which(include.obs & !is.na(dv))
  
  if(include.covariates){
    lm.obj <- lm(dv[include.obs] ~ data$treat[include.obs] +
                                        x[include.obs,])
    # Calculate cluster-robust standard errors.
    result <- cl(lm.obj, data$hh_id[include.obs])[2,]
    result <- t(data.frame(result))
    names(result) <- c("Effect", "SE", "t-stat", "p")
    rownames(result) <- c("Treat")
    result <- data.frame(result)
  }
  if(!include.covariates){
    lm.obj <- lm(dv[include.obs] ~ data$treat[include.obs])
    # Calculate cluster-robust standard errors.
    result <- cl(lm.obj, data$hh_id[include.obs])[2,]
    result <- t(data.frame(result))
    names(result) <- c("Effect", "SE", "t-stat", "p")
    rownames(result) <- c("Treat")
    result <- data.frame(result)
  }

  return(result)
}
```
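
The degrees-of-freedom correction and sandwich form used by `cl()` can be checked by hand. The following is a minimal sketch on simulated data, not the analysis code: the toy variables (`hh_id`, `treat`, `y`) and the household-level assignment are assumptions made for illustration, and only base R is used.

```r
# Toy data (hypothetical): 200 individuals nested in 100 two-person households.
set.seed(1)
n <- 200
hh_id <- rep(seq_len(n / 2), each = 2)
treat <- rep(rbinom(n / 2, 1, 0.5), each = 2)       # assumed household-level assignment
y <- 0.1 * treat + rnorm(n / 2)[hh_id] + rnorm(n)   # shared household shock + noise

fit <- lm(y ~ treat)
X <- model.matrix(fit)
u <- residuals(fit)

# Same small-sample correction as in cl(): M clusters, N observations, K coefficients.
M <- length(unique(hh_id))
N <- nrow(X)
K <- ncol(X)
dfc <- (M / (M - 1)) * ((N - 1) / (N - K))

# Sum the score contributions X_i * u_i within each cluster.
scores <- rowsum(X * u, group = hh_id)

# Sandwich: (X'X)^{-1} [sum of cluster score outer products] (X'X)^{-1}.
bread <- solve(crossprod(X))
vcov_cl <- dfc * bread %*% crossprod(scores) %*% bread
se_cl <- sqrt(diag(vcov_cl))
se_cl
```

Summing the scores within households before forming the outer product is what allows residuals to be correlated within a household.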

### Contact Rate

Contact is defined as the voter coming to the door and being identified before the topics of the placebo or treatment begin. Across the two conditions among voters who responded to the baseline survey and were then randomly assigned to an experimental condition, the contact rates were:

* Placebo: `r round(mean(subset(data, data$treat == 0)$canvassed, na.rm = TRUE), 2)`.
* Treatment: `r round(mean(subset(data, data$treat == 1)$canvassed, na.rm = TRUE), 2)`.

On average, the conversations lasted:

* Placebo: `r round(mean(subset(data, data$treat == 0 & data$canvassed == 1)$length, na.rm = TRUE), 2)` minutes (standard deviation of `r round(sd(subset(data, data$treat == 0 & data$canvassed == 1)$length, na.rm = TRUE), 2)` minutes).
* Treatment: `r round(mean(subset(data, data$treat == 1 & data$canvassed == 1)$length, na.rm = TRUE), 2)` minutes (standard deviation of `r round(sd(subset(data, data$treat == 1 & data$canvassed == 1)$length, na.rm = TRUE), 2)` minutes). Among those who provided at least the first rating, the conversations lasted an average of `r round(mean(subset(data, data$treat == 1 & data$canvassed == 1 & !is.na(data$first_rating))$length, na.rm = TRUE), 2)` minutes, with a standard deviation of `r round(sd(subset(data, data$treat == 1 & data$canvassed == 1 & !is.na(data$first_rating))$length, na.rm = TRUE), 2)` minutes.

### Complier Average Causal Effects

In the pre-analysis plan, we noted that we planned to estimate the complier average causal effect (CACE) by dividing the average treatment effect (ATE) by the proportion of conversations where the voter answered the first rating question. This CACE assumes that 1) there was no effect of the intervention for the voters who immediately refused to talk, and 2) there are no defiers; that is, no voters who would receive the intervention only if assigned to the placebo group and would not receive it if assigned to the treatment group \cite{gerber2012field}. Reporting the CACE would not change the experimental comparison we conduct, but would increase the point estimates to account for the measurement error in the treatment indicator.

We do not report the CACE estimates due to space constraints, but note that these can be computed by taking the ATE estimates and multiplying by approximately 1.12, given the contact rate of approximately 89\%. That is, given a true effect on compliers of 1.12 times the size of the ATEs we observed, we would on average estimate ATEs of the magnitude that we did.
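
The scaling described above is simple arithmetic. As a small sketch, using the approximate 89% rate from the text and a purely hypothetical ATE of 0.05 standard deviations (the names `compliance_rate` and `ate_example` are illustrative):

```r
# Approximate compliance (contact) rate reported in the text.
compliance_rate <- 0.89

# Factor that converts an ATE into a CACE under the two assumptions above.
scale_factor <- 1 / compliance_rate
round(scale_factor, 2)   # 1.12

# Hypothetical ATE, for illustration only (not an estimate from this study).
ate_example <- 0.05
cace_example <- ate_example / compliance_rate
round(cace_example, 3)   # 0.056
```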

# Tests of Design Assumptions

### Covariate Balance among All Subjects, Compliers, and Reporters

The tables below demonstrate that balance on pre-treatment observable attributes is maintained among the original universe of pre-survey respondents randomized to each group, the sub-sample that was canvassed, and the sub-sample that was both canvassed and successfully re-interviewed. Each table shows the mean value of each covariate (measured in the baseline survey before treatment) under each condition, as well as the *p*-value from a one-way ANOVA test. The first table considers all voters who were randomly assigned after having taken the pre-survey (all subjects); the second table considers all voters who were successfully contacted (compliers); the remaining tables consider all voters who responded to the first through third post-surveys (reporters).

```{r eval=TRUE, echo=FALSE}
balance.vars <- c('t0_abortion_factor', 't0_pid7', 't0_therm_pp',
                  't0_therm_women_abort', 't0_therm_afam',
                  'vf_female', 'vf_age',
                  't1_respondent')
balance.vars.names <- c("Baseline Abortion Factor",
                        "Party ID",
                        "Planned Parenthood Therm",
                        "Women Who Had Abortions Therm",
                        "African American Therm",
                        "Female",
                        "Age",
                        "2nd Baseline Respondent")

make.balance.table <- function(subset, varlist, names, caption){
  file <- subset(data, subset)
  balance <- matrix(ncol=2, nrow=length(varlist)+1)
  for(i in seq_along(varlist)){
    # Index with the function argument rather than the global balance.vars.
    balance[i,1] <- mean(file[file$treat == 0,varlist[i]])
    balance[i,2] <- mean(file[file$treat == 1,varlist[i]])
  }
  balance[length(varlist)+1,1] <- nrow(file[file$treat==0,])
  balance[length(varlist)+1,2] <- nrow(file[file$treat==1,])
  balance <- data.frame(round(balance, digits=2))
  rownames(balance) <- c(names, "N")
  anova.test.vector <- matrix(ncol=1, nrow=nrow(balance))
  for(i in 1:length(varlist)){
    anova.test.vector[i,1] <- round(summary(aov(file[,varlist[i]] ~ 
                                                  as.factor(treat), data = file))[[1]][["Pr(>F)"]][[1]], 2)
  }
  anova.test.vector[nrow(balance),1] <- "-"
  balance <- cbind(balance,anova.test.vector)
  colnames(balance) <- c("Placebo", "Treat", "p-value")
  return(kable(balance, caption = caption)  %>%
    kable_styling(latex_options = c("striped", "HOLD_position")))
} 
make.balance.table(data$t0_respondent == 1 & !is.na(data$treat), balance.vars, balance.vars.names,
                   "Covariate Balance among Pre-Survey Respondents.")
```

```{r eval=TRUE, echo=FALSE}
make.balance.table(data$canvassed == 1, balance.vars, balance.vars.names,
                   "Covariate Balance among Compliers.")
```

```{r eval=TRUE, echo=FALSE}
make.balance.table(data$t2_respondent == 1, balance.vars, balance.vars.names,
                   "Covariate Balance among 1st Post-Survey Respondents.")
```

```{r eval=TRUE, echo=FALSE}
make.balance.table(data$t3_respondent == 1, balance.vars, balance.vars.names,
                   "Covariate Balance among 2nd Post-Survey Respondents.")
```

```{r eval=TRUE, echo=FALSE}
make.balance.table(data$t4_respondent == 1, balance.vars, balance.vars.names,
                   "Covariate Balance among 3rd Post-Survey Respondents.")
```
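
With only two conditions, the one-way ANOVA *p*-value reported in the balance tables is equivalent to the *p*-value from an equal-variance two-sample *t*-test. A minimal sketch on simulated data (the data frame `toy` and its columns are hypothetical):

```r
# Hypothetical covariate (age) for 50 placebo and 50 treatment subjects.
set.seed(2)
toy <- data.frame(treat = rep(0:1, each = 50),
                  age = rnorm(100, mean = 50, sd = 15))

# One-way ANOVA p-value, extracted as in make.balance.table().
p_anova <- summary(aov(age ~ as.factor(treat), data = toy))[[1]][["Pr(>F)"]][1]

# Equal-variance two-sample t-test p-value.
p_ttest <- t.test(age ~ treat, data = toy, var.equal = TRUE)$p.value

all.equal(p_anova, p_ttest)
```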

### Survey Attrition

An important design assumption is that the treatment does not affect the composition of the individuals who take each follow-up survey \cite{broockman2017design}. We investigate this by regressing an indicator for responding to each post-treatment survey on an indicator of treatment assignment. Across the three survey waves, we find no evidence of differential attrition.

```{r eval=TRUE, echo=FALSE}
t2.respondents <- summary(lm(t2_respondent ~ treat, data = data))$coef[2,]
t3.respondents <- summary(lm(t3_respondent ~ treat, data = data))$coef[2,]
t4.respondents <- summary(lm(t4_respondent ~ treat, data = data))$coef[2,]

overall.attrition <- data.frame(round(rbind(t2.respondents, t3.respondents, t4.respondents), 3))
names(overall.attrition) <- c("Effect", "SE", "t.stat", "p")
overall.attrition <- as.matrix(overall.attrition)
rownames(overall.attrition) <- rep(c("Treat"), 3)


kable(overall.attrition, digits=2, caption = "Test for differential attrition") %>%
    pack_rows("1 Week", 1, 1) %>%
    pack_rows("1 Month", 2, 2) %>%
    pack_rows("3 Months", 3, 3) %>%
    kable_styling(latex_options = c("striped", "HOLD_position"))
```
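
In a regression of a binary response indicator on a binary treatment indicator, the treatment coefficient equals the difference in response rates between conditions. A small sketch on simulated data (all names hypothetical; response is unrelated to treatment by construction):

```r
set.seed(3)
toy <- data.frame(treat = rep(0:1, each = 250))
toy$responded <- rbinom(500, 1, 0.7)   # same response probability in both arms

fit <- lm(responded ~ treat, data = toy)

# Difference in response rates, treatment minus placebo.
diff_in_rates <- mean(toy$responded[toy$treat == 1]) -
  mean(toy$responded[toy$treat == 0])

all.equal(unname(coef(fit)["treat"]), diff_in_rates)

# Row analogous to those stacked in the attrition table above.
summary(fit)$coef["treat", ]
```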

### Test of Differential Attrition by Covariates

The above subsection demonstrated that there was no average differential attrition; now, we test whether the treatment caused attrition to differ by covariates (for example, whether it encouraged already-supportive subjects to complete the post-survey but discouraged unsupportive subjects from doing so) \cite{gerber2012field}. To test whether attrition patterns by covariates are similar in treatment and placebo, we use a linear regression of whether or not an individual responded to the follow-up survey on treatment, baseline covariates, and treatment-covariate interactions. We then perform a heteroskedasticity-robust F-test of the hypothesis that all the interaction coefficients are zero. This procedure was pre-specified in our pre-analysis plan and is standard practice \cite{gerber2012field}. Below we report the *p*-value of this F-test for each survey wave. Based on the results presented in the table below, there does not appear to be evidence of asymmetrical attrition.

```{r, eval=TRUE, echo=FALSE}
get.F.stat <- function(attrit.indicator){
  reduced.model <- lm(attrit.indicator ~ x + treat, data = data)
  xXt <- matrix(nrow = nrow(x), ncol = ncol(x))
  for(col in 1:ncol(x)) xXt[,col] <- as.numeric(data$treat) * x[,col]
  full.model <- lm(attrit.indicator ~ x + data$treat + xXt)
  return(round(anova(reduced.model, full.model)[2,"Pr(>F)"], 2))
}

attrition <- matrix(, nrow = 3, ncol = 2)

attrition[1,2] <- get.F.stat(data$t2_respondent)
attrition[2,2] <- get.F.stat(data$t3_respondent)
attrition[3,2] <- get.F.stat(data$t4_respondent)

attrition[,1] <- c("1 Week Survey (t2)", "1 Month Survey (t3)", "3 Month Survey (t4)")

kable(attrition, digits = 2, caption = "p-values from the Test of Differential Attrition by Covariates, by Survey Wave.") %>%
  kable_styling(latex_options = c("striped", "HOLD_position"))
```
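
The text above describes a heteroskedasticity-robust F-test of the interaction terms. One way such a test could be computed with the `lmtest` and `sandwich` packages loaded in this document is sketched below on simulated data. The HC2 estimator and the toy variables (`x1`, `x2`, `responded`) are assumptions, and this is not necessarily the exact computation behind the reported *p*-values.

```r
library(lmtest)    # waldtest()
library(sandwich)  # vcovHC()

set.seed(4)
n <- 400
toy <- data.frame(treat = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n))
toy$responded <- rbinom(n, 1, 0.7)   # response unrelated to covariates or treatment

# Reduced model: covariates and treatment only.
reduced <- lm(responded ~ x1 + x2 + treat, data = toy)
# Full model: adds the treatment-by-covariate interactions.
full <- lm(responded ~ (x1 + x2) * treat, data = toy)

# Robust F-test that all interaction coefficients are zero.
robust_test <- waldtest(reduced, full,
                        vcov = function(m) vcovHC(m, type = "HC2"))
p_robust <- robust_test[2, "Pr(>F)"]

# Classical (homoskedastic) F-test, for comparison.
p_classical <- anova(reduced, full)[2, "Pr(>F)"]

round(c(robust = p_robust, classical = p_classical), 2)
```

The two *p*-values typically differ only slightly when the error variance is similar across subjects, but the robust version does not rely on that assumption.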

# Results

Below we report the results in tabular form at each time period and for each outcome measure. In each section, the table shows the average treatment effect. Each table includes two models: one that adjusts for the pre-specified pre-treatment covariates to improve precision and a second, unadjusted model. We pre-registered a focus on the covariate-adjusted estimates, since we expected these to be much more precise; the experimental design was intended to draw substantial statistical power from the baseline survey. However, we also present results without covariates for completeness.

```{r, eval=TRUE, echo=FALSE}
make.results.table.ate <- function(dv, caption) {
  t2.ate.nocovars <- est.ate(data[,paste0("t2_",dv)], data$t2_respondent==1, include.covariates = FALSE)
  t2.ate.covars <- est.ate(data[,paste0("t2_",dv)], data$t2_respondent==1, include.covariates = TRUE)
  t2 <- cbind(t2.ate.covars, t2.ate.nocovars)
  
  t3.ate.nocovars <- est.ate(data[,paste0("t3_",dv)], data$t3_respondent==1, include.covariates = FALSE)
  t3.ate.covars <- est.ate(data[,paste0("t3_",dv)], data$t3_respondent==1, include.covariates = TRUE)
  t3 <- cbind(t3.ate.covars, t3.ate.nocovars)
  
  t4.ate.nocovars <- est.ate(data[,paste0("t4_",dv)], data$t4_respondent==1, include.covariates = FALSE)
  t4.ate.covars <- est.ate(data[,paste0("t4_",dv)], data$t4_respondent==1, include.covariates = TRUE)
  t4 <- cbind(t4.ate.covars, t4.ate.nocovars)
  
  tALL.ate.nocovars <- est.ate(data[,paste0("tALL_",dv)], include.covariates = FALSE)
  tALL.ate.covars <- est.ate(data[,paste0("tALL_",dv)], include.covariates = TRUE)
  tALL <- cbind(tALL.ate.covars, tALL.ate.nocovars)
  
  overall <- round(rbind(t2, t3, t4, tALL), 4)
  
  names(overall) <- rep(c("Effect", "SE", "t.stat", "p"), 2)
  overall <- as.matrix(overall)
  rownames(overall) <- rep(c("Treat vs. Placebo"), 4)
  return(kable(overall, digits=3, caption = caption) %>%
    add_header_above(c(" " = 1, "With Covariates" = 4, "Without Covariates" = 4)) %>%
    pack_rows("1 Week", 1, 1) %>%
    pack_rows("1 Month", 2, 2) %>%
    pack_rows("3 Months", 3, 3) %>%
    pack_rows("Pooled", 4, 4) %>%
    kable_styling(latex_options = c("striped", "HOLD_position")))
}
```
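
To illustrate why adjusting for baseline covariates improves precision, the following sketch simulates an outcome that is strongly predicted by a baseline measure and compares the standard error of the treatment coefficient with and without adjustment. All names and parameter values are hypothetical and chosen only for illustration.

```r
set.seed(6)
n <- 500
baseline <- rnorm(n)                 # hypothetical baseline measure
treat <- rbinom(n, 1, 0.5)           # random assignment
outcome <- 0.1 * treat + 0.8 * baseline + rnorm(n, sd = 0.6)

# Standard error of the treatment coefficient without adjustment.
se_unadjusted <- summary(lm(outcome ~ treat))$coef["treat", "Std. Error"]

# Standard error after adjusting for the baseline measure.
se_adjusted <- summary(lm(outcome ~ treat + baseline))$coef["treat", "Std. Error"]

round(c(unadjusted = se_unadjusted, adjusted = se_adjusted), 3)
```

Because the baseline measure absorbs much of the outcome variance, the adjusted standard error is smaller even though both models target the same treatment effect.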

### Stigma Index

Below we present the ATE on the stigma index.

```{r, eval=TRUE, echo=FALSE}
make.results.table.ate("factor_stigma", "ATEs on stigma index")
```

### Policy Index

Below we present the ATE on the policy index.

```{r, eval=TRUE, echo=FALSE}
make.results.table.ate("factor_policy", "ATEs on policy index")
```

### Action-Taking Index

Below we present the ATE on the action-taking index.

```{r, eval=TRUE, echo=FALSE}
make.results.table.ate("factor_action", "ATEs on action-taking index")
```

### Planned Parenthood Thermometer

Below we present the ATE on the Planned Parenthood feeling thermometer. Like all other items, this outcome is standardized to have mean 0 and standard deviation 1.

```{r, eval=TRUE, echo=FALSE}
make.results.table.ate("therm_pp", "ATEs on Planned Parenthood feeling thermometer")
```


### Heterogeneous Treatment Effects

In this section we present heterogeneous treatment effects by pre-specified subgroups. For each result, we present the conditional average treatment effect, adjusting for covariates, within each subgroup. As the outcome in these analyses, we use the pooled average of the post-treatment surveys.

We pre-specified that we would investigate heterogeneous treatment effects by voter traits. We investigate these by comparing the ATEs within the pre-specified subgroups. The pre-specified hypotheses are:

* Voters high in political knowledge (`t0_knowledge_scale`) will be less persuadable.
* Voters in the middle of the abortion scale (`t0_abortion_factor`) will be more persuadable.

For ease of presentation, we analyze these subgroup effects by splitting each scale into thirds and examining treatment effects within each third.
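
As a rough sketch of how a continuous scale can be split into thirds, the following uses empirical terciles on a simulated score. The variable `scale_score` and the indicator names are hypothetical; the indicators used in the analysis below are taken from the data file.

```r
set.seed(7)
scale_score <- rnorm(300)   # hypothetical baseline scale

# Cut points at the 1/3 and 2/3 quantiles.
breaks <- quantile(scale_score, probs = c(0, 1/3, 2/3, 1))
third <- cut(scale_score, breaks = breaks, include.lowest = TRUE,
             labels = c("low", "mid", "high"))
table(third)

# 0/1 indicators for each third, usable as subgroup filters.
low_third  <- as.integer(third == "low")
mid_third  <- as.integer(third == "mid")
high_third <- as.integer(third == "high")
```

For a scale with few distinct values, such as a count of correct answers, ties make exactly equal thirds impossible, so the groups are only roughly equal in size.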

While we did not pre-specify these heterogeneous treatment effects, we also examine how treatment effects vary by canvasser demographics (canvassers completed a demographic survey to gather this data):

* Whether the canvasser is female or not.
* Whether the canvasser has children or not.
* Whether the canvasser has previously had an abortion or not.

#### Canvasser Gender

Below are results by whether the canvasser self-reported being female. `r round(mean(data$canvasser_female, na.rm = TRUE), 2)*100`% of subjects were canvassed by women.


```{r, eval=TRUE, echo=FALSE}
make.results.table.ate.subgroup.binary <- function(binary, caption) {
  policy.tALL.ate.group1 <- est.ate(data[,paste0("tALL_","factor_policy")],
                           binary == 1, include.covariates = TRUE)
  policy.tALL.ate.group0 <- est.ate(data[,paste0("tALL_","factor_policy")],
                           binary == 0, include.covariates = TRUE)

  policy <- round(cbind(policy.tALL.ate.group1, policy.tALL.ate.group0), 3)
  
  
  stigma.tALL.ate.group1 <- est.ate(data[,paste0("tALL_","factor_stigma")],
                           binary == 1, include.covariates = TRUE)
  stigma.tALL.ate.group0 <- est.ate(data[,paste0("tALL_","factor_stigma")],
                           binary == 0, include.covariates = TRUE)

  stigma <- round(cbind(stigma.tALL.ate.group1, stigma.tALL.ate.group0), 3)
  
  action.tALL.ate.group1 <- est.ate(data[,paste0("tALL_","factor_action")],
                           binary == 1, include.covariates = TRUE)
  action.tALL.ate.group0 <- est.ate(data[,paste0("tALL_","factor_action")],
                           binary == 0, include.covariates = TRUE)

  action <- round(cbind(action.tALL.ate.group1, action.tALL.ate.group0), 3)
  
  overall <- rbind(action, stigma, policy)
  
  names(overall) <- rep(c("Effect", "SE", "t.stat", "p"), 2)
  rownames(overall) <- c("Effect on Action-Taking", "Effect on Stigma", "Effect on Policy")
  overall <- as.matrix(overall)
  
  # header in add_header_above requires this
  group1.title = paste(caption, "= 1")
  group0.title = paste(caption, "= 0")
  myHeader <- c(" " = 1, group1.title = 4, group0.title = 4)
  names(myHeader) <- c(" ", group1.title, group0.title)
  
  return(kable(overall, digits=2, caption = paste("Heterogeneous treatment effects by", caption)) %>%
    add_header_above(c(myHeader)) %>%
    kable_styling(latex_options = c("striped", "HOLD_position")))
}

make.results.table.ate.subgroup.binary(data$canvasser_female, "Canvasser is female")
```

#### Canvasser has children

Below are results by whether the canvasser self-reported having children. `r round(mean(data$canvasser_children, na.rm = TRUE), 2)*100`% of subjects were canvassed by canvassers with children.

```{r, eval=TRUE, echo=FALSE}
make.results.table.ate.subgroup.binary(data$canvasser_children, "Canvasser has children")
```

#### Canvasser has had an abortion

Below are results by whether the canvasser self-reported having had an abortion. `r round(mean(data$canvasser_abortion, na.rm = TRUE), 2)*100`% of subjects were canvassed by a canvasser who self-reported having had an abortion.

```{r, eval=TRUE, echo=FALSE}
make.results.table.ate.subgroup.binary(data$canvasser_abortion, "Canvasser has had an abortion")
```

#### By voter political knowledge

In the baseline survey, we asked four political knowledge questions. Roughly one-third of respondents correctly answered 0, 1, or 2 of these questions, one-third correctly answered 3, and one-third correctly answered all 4. The questions were:

* Do you happen to remember which party controls the United States Senate – that is, which party has a majority of members in the United States Senate?
* How about the Maine House of Representatives? Do you happen to know which party has the most members in the Maine House of Representatives right now?
* Do you happen to remember who is the current US Attorney General?
* Do you happen to remember what industry the Dodd-Frank Act regulates?

```{r, eval=TRUE, echo=FALSE}
make.results.table.ate.subgroup.threegroup <- function(group1, group1.title, 
                                                       group2, group2.title, 
                                                       group3, group3.title, 
                                                       caption) {

  policy.tALL.ate.group1 <- est.ate(data[,paste0("tALL_","factor_policy")],
                           group1 == 1, include.covariates = TRUE)
  policy.tALL.ate.group2 <- est.ate(data[,paste0("tALL_","factor_policy")],
                           group2 == 1, include.covariates = TRUE)
  policy.tALL.ate.group3 <- est.ate(data[,paste0("tALL_","factor_policy")],
                           group3 == 1, include.covariates = TRUE)

  policy <- round(cbind(policy.tALL.ate.group1, policy.tALL.ate.group2,
                        policy.tALL.ate.group3), 3)
  
  
  stigma.tALL.ate.group1 <- est.ate(data[,paste0("tALL_","factor_stigma")],
                           group1 == 1, include.covariates = TRUE)
  stigma.tALL.ate.group2 <- est.ate(data[,paste0("tALL_","factor_stigma")],
                           group2 == 1, include.covariates = TRUE)
  stigma.tALL.ate.group3 <- est.ate(data[,paste0("tALL_","factor_stigma")],
                           group3 == 1, include.covariates = TRUE)

  stigma <- round(cbind(stigma.tALL.ate.group1, stigma.tALL.ate.group2,
                        stigma.tALL.ate.group3), 3)
  
  action.tALL.ate.group1 <- est.ate(data[,paste0("tALL_","factor_action")],
                           group1 == 1, include.covariates = TRUE)
  action.tALL.ate.group2 <- est.ate(data[,paste0("tALL_","factor_action")],
                           group2 == 1, include.covariates = TRUE)
  action.tALL.ate.group3 <- est.ate(data[,paste0("tALL_","factor_action")],
                           group3 == 1, include.covariates = TRUE)

  action <- round(cbind(action.tALL.ate.group1, action.tALL.ate.group2,
                        action.tALL.ate.group3), 3)
  
  overall <- rbind(action, stigma, policy)
  names(overall) <- rep(c("Effect", "SE", "t.stat", "p"), 3)
  rownames(overall) <- c("Effect on Action-Taking", "Effect on Stigma", "Effect on Policy")
  
  # header in add_header_above requires this
  myHeader <- c(" " = 1, group1.title = 4, group2.title = 4, group3.title = 4)
  names(myHeader) <- c(" ", group1.title, group2.title, group3.title)
  
  return(kable(overall, digits=2, caption = paste("Heterogeneous treatment effects by", caption)) %>%
    add_header_above(c(myHeader)) %>%
    kable_styling(latex_options = c("striped", "HOLD_position", "scale_down")))
}

make.results.table.ate.subgroup.threegroup(data$t0_pk_index_lowthird, "0, 1, 2 correct",
                                           data$t0_pk_index_midthird, "3 correct",
                                           data$t0_pk_index_highthird, "4 correct", "political knowledge")
```

#### By voter baseline support

Below are the results by the baseline support of the voter. This is based on an index combining all of the abortion policy and prejudice questions from the baseline survey. We divide this index into terciles and report results for each tercile.

To better contextualize these terciles, we provide breakdowns of baseline attitudes across each of the terciles:

| Tercile          | % Democratic at Baseline | Average “Women who have had abortions” Feeling Thermometer at Baseline | % “Not at all likely” to volunteer for an organization that supports women who need abortions |
|------------------|--------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------|
| Least Supportive | 3%                       | 46                                                                     | 95%                                                                                           |
| Mid Supportive   | 11%                      | 58                                                                     | 74%                                                                                           |
| Most Supportive  | 24%                      | 75                                                                     | 44%                                                                                           |

```{r, eval=TRUE, echo=FALSE}
data$t0_least_supportive <- ifelse(data$t0_baseline_support_thirds == 1, 1, 0)
data$t0_mid_supportive <- ifelse(data$t0_baseline_support_thirds == 2, 1, 0)
data$t0_most_supportive <- ifelse(data$t0_baseline_support_thirds == 3, 1, 0)
make.results.table.ate.subgroup.threegroup(data$t0_least_supportive, "Least Supportive",
                                           data$t0_mid_supportive, "Mid Supportive",
                                           data$t0_most_supportive, "Most Supportive", "abortion support in baseline survey")
```

#### By voter party

Below are the results by the party of the voter. We compare self-identified Democrats, Republicans, and Independents (including leaners), based on responses to the baseline survey. This was not mentioned in our pre-analysis plan, but given the partisan nature of abortion as an issue, we report it as an exploratory result.


```{r, eval=TRUE, echo=FALSE}
make.results.table.ate.subgroup.threegroup(data$t0_dem, "Democrat",
                                           data$t0_rep, "Republican",
                                           data$t0_indep, "Indep/Other", "voter party in baseline survey")
```

#### By voter gender

Below are results by whether the voter is listed on the voter file as being female. This was not mentioned in our pre-analysis plan, but was suggested by a reviewer.

```{r, eval=TRUE, echo=FALSE}
make.results.table.ate.subgroup.binary(data$vf_female, "Voter is female")
```

#### By voter education

Below are results by whether the voter self-reported having a college degree. This was not mentioned in our pre-analysis plan, but was suggested by a reviewer.

```{r, eval=TRUE, echo=FALSE}
make.results.table.ate.subgroup.binary(data$t1_college_educ, "Voter is college educated")
```

### Benchmarking Results

It may be difficult for readers to interpret the magnitude of an effect presented in standard deviation units. We therefore take two non-pre-registered approaches to help communicate the substantive size of our estimates.

#### Strong Support

First, one way to make the results more interpretable is to examine treatment effects on whether participants said they **strongly supported** (provided the highest level of agreement with) the policies, sentiments, and actions asked about in the surveys. This approximates how participants might vote on each proposal if faced with a ballot measure, or act if faced with a decision. Note that we did not pre-specify this benchmarking procedure; we use it to illustrate the magnitude of our findings. In parentheses, P, S, and A refer to the policy, stigma, and action-taking items. An asterisk is used when the predicted effect should be negative.

The results on the individual dichotomized items are as follows:

```{r, echo=FALSE}
get.t2.itts <- function(dv) {
  t2.ate.covars <- est.ate(data[,dv], data$t2_respondent==1, include.covariates = TRUE)
  return(rbind(t2.ate.covars))
}

overall <- rbind(get.t2.itts('t2_law_appointment_max'),
                 get.t2.itts('t2_law_private_insur_max'),
                 get.t2.itts('t2_law_counsel_max'),
                 get.t2.itts('t2_law_public_insur_max'),
                 get.t2.itts('t2_ballot_abortion_max'),
                 get.t2.itts('t2_abortion_something_wrong_max'),
                 get.t2.itts('t2_abortion_badly_max'),
                 get.t2.itts('t2_abortion_consider_max'),
                 get.t2.itts('t2_abortion_birthcontrol_max'),
                 get.t2.itts('t2_abortion_nothing_wrong_max'),
                 get.t2.itts('t2_act_volunteer_clinic_max'),
                 get.t2.itts('t2_act_accompany_max'),
                 get.t2.itts('t2_act_protest_clinic_max'),
                 get.t2.itts('t2_act_congress_support_max'),
                 get.t2.itts('t2_act_congress_opposition_max'))
  
names(overall) <- rep(c("Effect", "SE", "t.stat", "p"), 1)
overall <- as.matrix(overall)
rownames(overall) <- c(
  'law_appointment (P*)',
  'law_private_insur (P*)',
  'law_counsel (P*)',
  'law_public_insur (P)',
  'ballot_abortion (P*)',
  'abortion_something_wrong (S*)',
  'abortion_badly (S*)',
  'abortion_consider (S)',
  'abortion_birthcontrol (S*)',
  'abortion_nothing_wrong (S)',
  'act_volunteer_clinic (A)',
  'act_accompany (A)',
  'act_protest_clinic (A*)',
  'act_congress_support (A)',
  'act_congress_opposition (A*)'
)

kable(overall, digits=3, caption = 'Effects on strong support (maximum response) at first post-treatment survey') %>%
  add_header_above(c(" " = 1, "With Covariates" = 4)) %>%
  kable_styling(latex_options = c("striped", "HOLD_position"))
```

#### Any Support

Second, we also conduct a version of this benchmarking where we dichotomize each variable to record whether participants registered any support (not only strong agreement). These are coded to 1 if a participant agreed at all with the policy/statement and 0 otherwise (indicating stated indifference or opposition). For the action-taking items, we examine cases where the respondent was either extremely or very likely to take the action. The results on the individual items dichotomized in this manner are as follows:

```{r, eval=TRUE, echo=FALSE}

overall <- rbind(get.t2.itts('t2_law_appointment_pos'),
                 get.t2.itts('t2_law_private_insur_pos'),
                 get.t2.itts('t2_law_counsel_pos'),
                 get.t2.itts('t2_law_public_insur_pos'),
                 get.t2.itts('t2_ballot_abortion_pos'),
                 get.t2.itts('t2_abortion_something_wrong_pos'),
                 get.t2.itts('t2_abortion_badly_pos'),
                 get.t2.itts('t2_abortion_consider_pos'),
                 get.t2.itts('t2_abortion_birthcontrol_pos'),
                 get.t2.itts('t2_abortion_nothing_wrong_pos'),
                 get.t2.itts('t2_act_volunteer_clinic_pos'),
                 get.t2.itts('t2_act_accompany_pos'),
                 get.t2.itts('t2_act_protest_clinic_pos'),
                 get.t2.itts('t2_act_congress_support_pos'),
                 get.t2.itts('t2_act_congress_opposition_pos'))
  
names(overall) <- rep(c("Effect", "SE", "t.stat", "p"), 1)
overall <- as.matrix(overall)
rownames(overall) <- c(
  'law_appointment (P*)',
  'law_private_insur (P*)',
  'law_counsel (P*)',
  'law_public_insur (P)',
  'ballot_abortion (P*)',
  'abortion_something_wrong (S*)',
  'abortion_badly (S*)',
  'abortion_consider (S)',
  'abortion_birthcontrol (S*)',
  'abortion_nothing_wrong (S)',
  'act_volunteer_clinic (A)',
  'act_accompany (A)',
  'act_protest_clinic (A*)',
  'act_congress_support (A)',
  'act_congress_opposition (A*)'
)

kable(overall, digits=3, caption = 'Effects on any support (agree at all) at first post-treatment survey') %>%
  add_header_above(c(" " = 1, "With Covariates" = 4)) %>%
  kable_styling(latex_options = c("striped", "HOLD_position"))
```
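
The two dichotomizations used in this section can be sketched as follows. The 5-point agreement scale, its midpoint, and the variable names are assumptions made for illustration; the `_max` and `_pos` variables used above are taken from the data file.

```r
# Hypothetical 5-point item: 1 = strongly disagree ... 5 = strongly agree.
response <- c(1, 2, 3, 4, 5, 5, NA)
scale_max <- 5   # highest level of agreement
scale_mid <- 3   # indifference

# "Strong support": 1 only for the highest level of agreement.
response_max <- ifelse(response == scale_max, 1, 0)

# "Any support": 1 for any agreement, 0 for indifference or opposition.
response_pos <- ifelse(response > scale_mid, 1, 0)

# Missing responses stay missing under both codings.
cbind(response, response_max, response_pos)
```

Because both outcomes are binary, the estimated effects in the two tables above can be read as changes in the proportion of respondents meeting each threshold.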

### Results with Weights

To assess the generalizability of our results, we compare our main results -- a sample average treatment effect (SATE) -- to an estimate of the population average treatment effect (PATE). As \cite{miratrix2018worth} note, "The PATE can only be different from the SATE when two things hold: (1) there is meaningful variation in the treatment impact, and (2) that variation is correlated with the weights...It is important to compare the PATE and SATE estimates. A meaningful discrepancy between them is a signal to look for treatment effect heterogeneity and a flag that weight misspecification could be a real concern. If the estimates do not differ, however, and there is no other evidence of heterogeneity, then extrapolation is less of a concern -- and furthermore the SATE is probably a sufficient estimate for the PATE."

To estimate the PATE, we first construct weights so that those who were canvassed and took the survey resemble the starting universe. We construct these weights using entropy balancing \cite{hainmueller2012entropy}, weighting on gender and age (we do not have vote history and the universe is overwhelmingly white; we have no other covariates on the starting universe).

Below are results with and without these weights, showing that the estimated SATEs and PATEs are similar.

Note that this analysis was not pre-registered but was prompted by feedback on the draft version of the paper.

```{r, eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE, results='hide', include=FALSE}
calculate.weighted.ate <- function(survey_respondent, dv) {
  
  # Get the data in shape
  weighting.data <- data %>%
  filter(survey_respondent == 1) %>%
  mutate(Baseline = 0) %>%
  rbind.data.frame(mutate(data, Baseline = 1))
  
  # Calculate weights using ebal, which implements entropy balancing
  # for causal effects as described in Hainmueller (2012).
  
  X.ebal <- select(weighting.data,
                   vf_female, vf_age)
  weight <- ebalance(weighting.data$Baseline, X.ebal)$w
  weighted.data <- filter(weighting.data, Baseline == 0)
  weighted.data$weight <- weight / mean(weight)
  
  data <- data %>%
    join(select(weighted.data, id, weight))
  
  include.obs <- which(!is.na(dv))
  
  unweighted <- cl(lm(
    dv[include.obs] ~ data$treat[include.obs] +
                 x[include.obs,]),
     data$hh_id[include.obs])[2,]
  
  weighted <- cl(lm(
    dv[include.obs] ~ data$treat[include.obs] +
                 x[include.obs,],
    weights = data$weight[include.obs]),
     data$hh_id[include.obs])[2,]

  return(t(cbind(unweighted, weighted)))
}

t2.policy <- calculate.weighted.ate(data$t2_respondent, data$t2_factor_policy)
t3.policy <- calculate.weighted.ate(data$t3_respondent, data$t3_factor_policy)
t4.policy <- calculate.weighted.ate(data$t4_respondent, data$t4_factor_policy)

t2.action <- calculate.weighted.ate(data$t2_respondent, data$t2_factor_action)
t3.action <- calculate.weighted.ate(data$t3_respondent, data$t3_factor_action)
t4.action <- calculate.weighted.ate(data$t4_respondent, data$t4_factor_action)

t2.stigma <- calculate.weighted.ate(data$t2_respondent, data$t2_factor_stigma)
t3.stigma <- calculate.weighted.ate(data$t3_respondent, data$t3_factor_stigma)
t4.stigma <- calculate.weighted.ate(data$t4_respondent, data$t4_factor_stigma)
```

```{r, eval=TRUE, echo=FALSE, warning=FALSE, message=FALSE}
overall <- data.frame(rbind(t2.policy, t3.policy, t4.policy,
                            t2.stigma, t3.stigma, t4.stigma,
                            t2.action, t3.action, t4.action))
names(overall) <- c("Effect", "SE", "t.stat", "p")
overall <- as.matrix(overall)
rownames(overall) <- rep(c("Unweighted - 1 Wk", "Weighted - 1 Wk",
                           "Unweighted - 1 Mo", "Weighted - 1 Mo",
                           "Unweighted - 3 Mo", "Weighted - 3 Mo"), 3)
  
kable(overall, digits=3, caption = "Treatment Effect Estimates with and without Weights") %>%
    pack_rows("Policy", 1, 6) %>%
    pack_rows("Stigma", 7, 12) %>%
    pack_rows("Action", 13, 18) %>%
    kable_styling(latex_options = c("striped", "HOLD_position"))

```

### Comparison with Broockman, Kalla, and Sekhon (2017)

This study and \cite{broockman2017design} draw on very different populations. For example, the sample in this study is nearly 100% white, while the sample in \cite{broockman2017design} is around 46% white. This difference could explain why the two studies reach different findings. To address this concern, we re-analyzed \cite{broockman2017design}, re-weighting its sample to match the demographics of the current study. In particular, we used entropy balancing \cite{hainmueller2012entropy} to weight on race, gender, age, partisanship, and ideology. On the All DVs t1 outcome, \cite{broockman2017design} report (see their Table OA4, Model 1, Row 1) an effect of -0.01 (SE = 0.04). In our re-analysis with re-weighting, we estimate an effect of 0.003 (SE = 0.05). Thus, even after re-weighting \cite{broockman2017design} to match the demographics of the current study, we continue to find a null effect in that study.[^R]

[^R]: We thank an anonymous reviewer for this helpful suggestion.


```{r, include=FALSE, echo=FALSE}
# FIGURE CODE

t2 <- rbind(
  cbind(data.frame(est.ate(data$t2_factor_policy, data$t2_respondent==1)), 
        dv = "Policy Index", time = "1 Week"),
  cbind(data.frame(est.ate(data$t2_factor_stigma, data$t2_respondent==1)), 
        dv = "Stigma Index", time = "1 Week"),
  cbind(data.frame(est.ate(data$t2_factor_action, data$t2_respondent==1)), 
        dv = "Action Index", time = "1 Week"),
  cbind(data.frame(est.ate(data$t2_therm_pp, data$t2_respondent==1)), 
        dv = "Planned Parenthood Therm", time = "1 Week")
)

t3 <- rbind(
  cbind(data.frame(est.ate(data$t3_factor_policy, data$t3_respondent==1)), 
        dv = "Policy Index", time = "1 Month"),
  cbind(data.frame(est.ate(data$t3_factor_stigma, data$t3_respondent==1)), 
        dv = "Stigma Index", time = "1 Month"),
  cbind(data.frame(est.ate(data$t3_factor_action, data$t3_respondent==1)), 
        dv = "Action Index", time = "1 Month"),
  cbind(data.frame(est.ate(data$t3_therm_pp, data$t3_respondent==1)), 
        dv = "Planned Parenthood Therm", time = "1 Month")
)

t4 <- rbind(
  cbind(data.frame(est.ate(data$t4_factor_policy, data$t4_respondent==1)), 
        dv = "Policy Index", time = "3 Months"),
  cbind(data.frame(est.ate(data$t4_factor_stigma, data$t4_respondent==1)), 
        dv = "Stigma Index", time = "3 Months"),
  cbind(data.frame(est.ate(data$t4_factor_action, data$t4_respondent==1)), 
        dv = "Action Index", time = "3 Months"),
  cbind(data.frame(est.ate(data$t4_therm_pp, data$t4_respondent==1)), 
        dv = "Planned Parenthood Therm", time = "3 Months")
)


tALL <- rbind(
  cbind(data.frame(est.ate(data$tALL_factor_policy)), 
        dv = "Policy Index", time = "Pooled"),
  cbind(data.frame(est.ate(data$tALL_factor_stigma)), 
        dv = "Stigma Index", time = "Pooled"),
  cbind(data.frame(est.ate(data$tALL_factor_action)), 
        dv = "Action Index", time = "Pooled"),
  cbind(data.frame(est.ate(data$tALL_therm_pp)), 
        dv = "Planned Parenthood Therm", time = "Pooled")
)


results <- rbind(t2, t3, t4, tALL)
names(results)[1:4] <- c("Effect", "SE", "t", "p")


# Set the display order of time periods and outcomes
results$time <- factor(results$time,
                       levels = c('Pooled', '3 Months', '1 Month', '1 Week'))
results$dv <- factor(results$dv,
                     levels = c('Stigma Index', 'Planned Parenthood Therm',
                                'Policy Index', 'Action Index'))

graph <- ggplot(results, 
                aes(x=dv, y=Effect, color=time)) +
  geom_point(aes(shape=time, color=time), position=position_dodge(width=0.5)) +
  # Thin error bars: Effect +/- 1.96 SE
  geom_errorbar(width=.1, 
                aes(ymin = Effect - 1.96 * SE, ymax = Effect + 1.96 * SE, color=time), 
                position=position_dodge(width=0.5)) + 
  # Thick error bars: Effect +/- 1 SE
  geom_errorbar(width=0, size = 1,
                aes(ymin = Effect - SE, ymax = Effect + SE, color=time), 
                position=position_dodge(width=0.5)) + 
  xlab("") +
  ylab("Treatment Effect in Standard Deviations") +
  ggtitle("Treatment Effect vs. Placebo") +
  geom_hline(yintercept=0, linetype = "dotted") + 
  theme_bw() +
  coord_flip() +
  guides(colour = guide_legend(reverse=T), 
         fill = guide_legend(reverse=T),
         shape = guide_legend(reverse=T)) +
  theme(legend.title = element_blank(),
        text = element_text(size=13))

# Save the graph
ggsave('e1_results_graph.pdf', graph, width = 7.5, height = 3.5, units = 'in', scale = 1)
```
